I3rab: A New Arabic Dependency Treebank Based on Arabic Grammatical Theory

نویسندگان

چکیده

Treebanks are valuable linguistic resources that include the syntactic structure of a language sentence in addition to part-of-speech tags and morphological features. They mainly utilized modeling statistical parsers. Although natural parser has recently become more accurate for languages such as English, those Arabic still have low accuracy. The purpose this article is construct new dependency treebank based on traditional grammatical theory characteristics language, investigate their effects accuracy proposed treebank, called I3rab, contrasts with existing treebanks two main concepts. first concept approach determining word sentence, second representation joined covert pronouns. To evaluate we compared its performance against subset Prague Dependency Treebank shares comparable level details. conducted experiments show percentage improvement reached up 10.24% UAS 18.42% LAS.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic Treebank-Based Acquisition of Arabic LFG Dependency Structures

A number of papers have reported on methods for the automatic acquisition of large-scale, probabilistic LFG-based grammatical resources from treebanks for English (Cahill and al., 2002), (Cahill and al., 2004), German (Cahill and al., 2003), Chinese (Burke, 2004), (Guo and al., 2007), Spanish (O’Donovan, 2004), (Chrupala and van Genabith, 2006) and French (Schluter and van Genabith, 2008). Here...

متن کامل

Tips and Tricks of the Prague Arabic Dependency Treebank

In this paper, we report on several software implementations that we have developed within Prague Arabic Dependency Treebank or some other projects concerned with Arabic Natural Language Processing. We try to guide the reader through some essential tasks and note the solutions that we have designed and used. We as well point to third-party computational systems that the research community might...

متن کامل

Syntactic Annotation Guidelines for the Quranic Arabic Dependency Treebank

The Quranic Arabic Dependency Treebank (QADT) is part of the Quranic Arabic Corpus (http://corpus.quran.com), an online linguistic resource organized by the University of Leeds, and developed through online collaborative annotation. The website has become a popular study resource for Arabic and the Quran, and is now used by over 1,500 researchers and students daily. This paper presents the tree...

متن کامل

DCU 250 Arabic Dependency Bank: An LFG Gold Standard Resource for the Arabic Penn Treebank

This paper describes the construction of a dependency bank gold standard for Arabic, DCU 250 Arabic Dependency Bank (DCU 250), based on the Arabic Penn Treebank Corpus (ATB) (Bies and Maamouri, 2003; Maamouri and Bies, 2004) within the theoretical framework of Lexical Functional Grammar (LFG). For parsing and automatically extracting grammatical and lexical resources from treebanks, it is neces...

متن کامل

Information Structure with the Prague Arabic Dependency Treebank

The issue of information structure in language has been studied extensively both in the Prague School of Linguistics (Mathesius, 1929) and in the Functional Generative Description (FGD), one of the modern theories of representation of linguistic meaning (Sgall, 1967; Sgall et al., 1986; Hajičová and Sgall, 2003, 2004). In its entirety, FGD constitutes the framework for a family of projects in c...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: ACM Transactions on Asian and Low-Resource Language Information Processing

سال: 2021

ISSN: ['2375-4699', '2375-4702']

DOI: https://doi.org/10.1145/3472295